World Happiness Report Data Analysis Project¶

Ashley Bissell AD450 Final Project Winter 2025

Thesis Statement:¶

This project explores which factors have the strongest impact on world happiness scores, and how happiness differs between countries and over time.

Exploratory Data Analysis¶

Summarize the data¶

I used .info() to list every column along with its data type and non-null count.

Happiness Dataset Info: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4000 entries, 0 to 3999
Data columns (total 24 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Country                    4000 non-null   object 
 1   Year                       4000 non-null   int64  
 2   Happiness_Score            4000 non-null   float64
 3   GDP_per_Capita             4000 non-null   float64
 4   Social_Support             4000 non-null   float64
 5   Healthy_Life_Expectancy    4000 non-null   float64
 6   Freedom                    4000 non-null   float64
 7   Generosity                 4000 non-null   float64
 8   Corruption_Perception      4000 non-null   float64
 9   Unemployment_Rate          4000 non-null   float64
 10  Education_Index            4000 non-null   float64
 11  Population                 4000 non-null   int64  
 12  Urbanization_Rate          4000 non-null   float64
 13  Life_Satisfaction          4000 non-null   float64
 14  Public_Trust               4000 non-null   float64
 15  Mental_Health_Index        4000 non-null   float64
 16  Income_Inequality          4000 non-null   float64
 17  Public_Health_Expenditure  4000 non-null   float64
 18  Climate_Index              4000 non-null   float64
 19  Work_Life_Balance          4000 non-null   float64
 20  Internet_Access            4000 non-null   float64
 21  Crime_Rate                 4000 non-null   float64
 22  Political_Stability        4000 non-null   float64
 23  Employment_Rate            4000 non-null   float64
dtypes: float64(21), int64(2), object(1)
memory usage: 750.1+ KB
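A summary like the one above comes from a single call to .info(). The sketch below is a minimal illustration on a few made-up stand-in rows (the values and the variable name happiness are assumptions, not the project's data), and it captures the text in a buffer so it can be inspected as a string.

```python
import io

import pandas as pd

# Hypothetical stand-in rows; the real data set has 4000 rows and 24 columns.
happiness = pd.DataFrame(
    {
        "Country": ["USA", "France", "China"],
        "Year": [2005, 2015, 2024],
        "Happiness_Score": [5.43, 6.66, 4.39],
        "Population": [703706700, 784568500, 1311940760],
    }
)

# .info() prints to stdout by default; passing buf= captures the text instead.
buffer = io.StringIO()
happiness.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```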

Get basic statistics¶

I used .describe() to check the range of each numeric column and confirm that the values look plausible.

Summary Statistics: 

Year Happiness_Score GDP_per_Capita Social_Support Healthy_Life_Expectancy Freedom Generosity Corruption_Perception Unemployment_Rate Education_Index ... Public_Trust Mental_Health_Index Income_Inequality Public_Health_Expenditure Climate_Index Work_Life_Balance Internet_Access Crime_Rate Political_Stability Employment_Rate
count 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 ... 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000
mean 2014.670750 5.455005 30482.009953 0.505860 67.917605 0.502723 0.143960 0.498920 10.966748 0.750385 ... 0.502812 69.976853 40.002648 6.009270 65.176380 5.987325 67.586327 45.526322 0.494105 74.021450
std 5.724075 1.427370 17216.122032 0.286202 10.172091 0.285219 0.200088 0.288866 5.210712 0.144819 ... 0.289186 17.128536 11.634987 2.291172 19.981357 1.725363 15.769023 20.300069 0.293191 13.906888
min 2005.000000 3.000000 1009.310000 0.000000 50.000000 0.000000 -0.200000 0.000000 2.000000 0.500000 ... 0.000000 40.000000 20.010000 2.010000 30.010000 3.000000 40.010000 10.030000 0.000000 50.000000
25% 2010.000000 4.237500 15425.125000 0.260000 59.177500 0.260000 -0.030000 0.240000 6.450000 0.630000 ... 0.260000 55.580000 29.865000 4.040000 48.170000 4.460000 53.910000 27.840000 0.230000 61.867500
50% 2015.000000 5.430000 29991.255000 0.510000 68.015000 0.500000 0.140000 0.500000 10.995000 0.750000 ... 0.500000 69.650000 40.015000 6.070000 64.755000 6.020000 68.015000 45.760000 0.490000 74.475000
75% 2020.000000 6.662500 45763.085000 0.750000 76.690000 0.750000 0.310000 0.742500 15.450000 0.880000 ... 0.760000 84.582500 50.187500 8.010000 82.652500 7.490000 81.332500 63.197500 0.760000 85.912500
max 2024.000000 8.000000 59980.720000 1.000000 85.000000 1.000000 0.500000 1.000000 19.990000 1.000000 ... 1.000000 100.000000 59.970000 10.000000 99.990000 9.000000 94.990000 79.990000 1.000000 98.000000

8 rows × 23 columns
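The plausibility check described here can also be made explicit in code. This is a minimal sketch on made-up rows; the bounds of 3 and 8 for Happiness_Score are simply the min and max reported in the table above, used here as an assumed valid range.

```python
import pandas as pd

# Hypothetical stand-in rows, not the project's data.
happiness = pd.DataFrame(
    {
        "Country": ["USA", "France", "China", "UK"],
        "Year": [2005, 2010, 2015, 2020],
        "Happiness_Score": [3.0, 5.0, 6.0, 8.0],
        "Social_Support": [0.0, 0.25, 0.75, 1.0],
    }
)

# describe() summarizes numeric columns only by default, so Country is skipped.
summary = happiness.describe()
print(summary)

# Assumed valid range, taken from the observed min and max of the full data set.
in_range = happiness["Happiness_Score"].between(3, 8).all()
print("All happiness scores within 3-8:", in_range)
```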

Get value counts of a categorical column¶

I used .value_counts() to see how the rows are distributed across countries.

Value Counts: 

Country
USA             429
France          415
Germany         413
Brazil          404
Australia       400
India           399
UK              395
Canada          386
South Africa    385
China           374
Name: count, dtype: int64
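Counts like these come from calling .value_counts() on the Country column. The sketch below uses a handful of made-up rows and also shows the normalize=True option, which reports each country's share of rows rather than the raw count.

```python
import pandas as pd

# Hypothetical stand-in rows, not the project's data.
happiness = pd.DataFrame(
    {"Country": ["USA", "USA", "USA", "France", "France", "China"]}
)

# Rows per country, sorted from most to least frequent.
counts = happiness["Country"].value_counts()
print(counts)

# The same breakdown expressed as a fraction of all rows.
shares = happiness["Country"].value_counts(normalize=True)
print(shares)
```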

Get histograms of numeric columns¶

I used .hist() to see the distribution of each numeric column.
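A minimal sketch of this step, on made-up values: DataFrame.hist() draws one histogram per numeric column and returns the grid of axes. The bin count and figure size are arbitrary choices, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import pandas as pd

# Hypothetical stand-in values, not the project's data.
happiness = pd.DataFrame(
    {
        "Happiness_Score": [3.0, 4.2, 5.4, 5.5, 6.1, 6.7, 7.3, 8.0],
        "Freedom": [0.0, 0.1, 0.3, 0.5, 0.5, 0.7, 0.9, 1.0],
        "Generosity": [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    }
)

# One subplot per numeric column; each subplot is titled with the column name.
axes = happiness.hist(bins=5, figsize=(8, 6))
figure = axes.flat[0].get_figure()
figure.tight_layout()
```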

Look for relationships in the data¶

I used scatter_matrix to look for obvious relationships between Happiness_Score and the other variables, and among subsets of variables. A single scatter matrix of all variables was too hard to read because of the large number of variables, so I plotted subsets instead.
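One way to plot subsets is to pair Happiness_Score with a few other columns at a time. The sketch below is an assumption about how that could be organized, not the notebook's actual code: the data is random, and the subset size of 2 is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Random stand-in data with a few of the real column names.
rng = np.random.default_rng(0)
columns = ["Happiness_Score", "GDP_per_Capita", "Social_Support", "Freedom", "Generosity"]
happiness = pd.DataFrame(rng.random((50, len(columns))), columns=columns)

target = "Happiness_Score"
others = [col for col in happiness.columns if col != target]
subset_size = 2  # arbitrary; small enough that each grid stays readable

# Draw one scatter matrix per subset, always including the target column.
grids = []
for start in range(0, len(others), subset_size):
    subset = [target] + others[start:start + subset_size]
    grids.append(scatter_matrix(happiness[subset], figsize=(6, 6), diagonal="hist"))
```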

Data Cleaning and Transformation¶

Filling NaN values¶

I used .isna() to check for missing values. There were none, so nothing needed to be filled.

There is no missing data.
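The check can be sketched as follows. The stand-in frame deliberately contains one missing value so that both branches are visible, and the median fill at the end is only one possible strategy, shown as an assumption about what could be done had the real data contained gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows with one deliberately missing score.
happiness = pd.DataFrame(
    {
        "Country": ["USA", "France", "China"],
        "Happiness_Score": [5.4, np.nan, 4.4],
        "Freedom": [0.5, 0.9, 0.4],
    }
)

# Count missing values per column, then overall.
missing_per_column = happiness.isna().sum()
total_missing = int(missing_per_column.sum())

if total_missing == 0:
    print("There is no missing data.")
else:
    print(missing_per_column[missing_per_column > 0])

# One possible fill strategy: replace missing scores with the column median.
filled = happiness.fillna(
    {"Happiness_Score": happiness["Happiness_Score"].median()}
)
```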

Correct data dtype issues¶

I used pd.to_datetime() to convert the Year column from int64 to datetime64. (I later had to reverse this by adding a numeric year column, Year_num.)

New dtype for Year column:  datetime64[ns]
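A minimal sketch of the conversion and its reversal, on made-up years. Converting through strings with an explicit "%Y" format is a choice made here to avoid a common pitfall: without a format, pd.to_datetime treats bare integers as nanoseconds since the epoch rather than as years.

```python
import pandas as pd

# Hypothetical stand-in years.
happiness = pd.DataFrame({"Year": [2005, 2015, 2024]})

# Parse each year as January 1 of that year.
happiness["Year"] = pd.to_datetime(happiness["Year"].astype(str), format="%Y")
print("New dtype for Year column: ", happiness["Year"].dtype)

# Reverse the conversion: recover a plain numeric year for later analysis.
happiness["Year_num"] = happiness["Year"].dt.year
```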

Data Joining¶

Merging two or more dataframes on a column¶

I created a DataFrame that maps each country to its continent and merged it with the happiness data on the Country column.

Country Year Happiness_Score GDP_per_Capita Social_Support Healthy_Life_Expectancy Freedom Generosity Corruption_Perception Unemployment_Rate Education_Index Population Urbanization_Rate Life_Satisfaction Public_Trust Mental_Health_Index Income_Inequality Public_Health_Expenditure Climate_Index Work_Life_Balance Internet_Access Crime_Rate Political_Stability Employment_Rate Continent
0 China 2022-01-01 4.39 44984.68 0.53 71.11 0.41 -0.05 0.83 14.98 0.52 1311940760 78.71 8.88 0.34 76.44 46.06 8.92 62.75 8.59 74.40 70.30 0.29 61.38 Asia
1 UK 2015-01-01 5.49 30814.59 0.93 63.14 0.89 0.04 0.84 19.46 0.83 1194240877 50.87 5.03 0.72 53.38 46.43 4.43 53.11 8.76 91.74 73.32 0.76 80.18 Europe
2 Brazil 2009-01-01 4.65 39214.84 0.03 62.36 0.01 0.16 0.59 16.68 0.95 731100898 48.75 5.22 0.23 82.40 31.03 3.78 33.30 6.06 71.80 28.99 0.94 72.65 South America
3 France 2019-01-01 5.20 30655.75 0.77 78.94 0.98 0.25 0.63 2.64 0.70 1293957314 81.78 5.69 0.68 46.87 57.65 4.43 90.59 6.36 86.16 45.76 0.48 55.14 Europe
4 China 2022-01-01 7.28 30016.87 0.05 50.33 0.62 0.18 0.92 7.70 0.92 1432971455 82.39 6.33 0.50 60.38 28.54 7.66 59.33 3.00 71.10 65.67 0.12 51.55 Asia
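The merge can be sketched like this, using the countries and continents from the rows above. The how="left" and validate="many_to_one" arguments are choices made for this sketch, not confirmed settings from the notebook: the first keeps every happiness row, and the second raises an error if a country appears twice in the lookup table.

```python
import pandas as pd

# A few rows from the table above (only two columns kept for brevity).
happiness = pd.DataFrame(
    {
        "Country": ["China", "UK", "Brazil", "France"],
        "Happiness_Score": [4.39, 5.49, 4.65, 5.20],
    }
)

# Lookup table: one row per country.
continents = pd.DataFrame(
    {
        "Country": ["China", "UK", "Brazil", "France"],
        "Continent": ["Asia", "Europe", "South America", "Europe"],
    }
)

merged = happiness.merge(continents, on="Country", how="left", validate="many_to_one")

# With a left merge, countries missing from the lookup show up as NaN.
unmatched = int(merged["Continent"].isna().sum())
print(merged)
print("Rows without a continent:", unmatched)
```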

Data Visualization¶

Asking thoughtful analytical questions¶

Questions:

  1. Has the USA happiness score trended upward or downward between 2005 and 2024?
  2. Which factors correlate most strongly with Happiness_Score for the USA?
  3. Is the correlation between Employment_Rate and Happiness_Score statistically significant?
  4. What might be causing the downward trend in Happiness_Score since 2021?
  5. Is the correlation between Happiness_Score and Healthy_Life_Expectancy for the USA from 2021 to 2024 statistically significant?

Happiness Score vs Year Linear Regression Fit:
Slope: 0.001
R-squared: 0.000
P-value: 0.928

Happiness Score vs Employment Rate Linear Regression Fit:
Slope: 0.010
R-squared: 0.008
P-value: 0.064

Happiness Score vs Healthy Life Expectancy (USA, 2021-2024) Linear Regression Fit:
Slope: 0.046
R-squared: 0.087
P-value: 0.004

At the conventional 0.05 significance level, only the third fit is statistically significant (p = 0.004), and even there healthy life expectancy explains less than 9% of the variance in happiness score (R-squared = 0.087). The fit against year shows no trend (p = 0.928), and the fit against employment rate falls just short of significance (p = 0.064).
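Fits reported in this form can be produced with scipy.stats.linregress, which returns the slope, the correlation coefficient r, and the p-value for the null hypothesis that the slope is zero. Whether the notebook used this function is an assumption; the helper name report_fit and the data below are made up for illustration.

```python
import pandas as pd
from scipy import stats


def report_fit(x, y, label):
    """Fit y = slope * x + intercept and print the summary lines."""
    result = stats.linregress(x, y)
    print(f"{label} Linear Regression Fit: ")
    print(f"Slope: {result.slope:.3f}")
    print(f"R-squared: {result.rvalue ** 2:.3f}")  # r squared = share of variance explained
    print(f"P-value: {result.pvalue:.3f}")
    return result


# Hypothetical stand-in values, not the project's data.
toy = pd.DataFrame(
    {
        "Healthy_Life_Expectancy": [61.8, 64.0, 66.5, 67.8, 68.7, 69.4],
        "Happiness_Score": [5.1, 5.3, 5.6, 5.5, 5.9, 6.1],
    }
)

fit = report_fit(
    toy["Healthy_Life_Expectancy"],
    toy["Happiness_Score"],
    "Happiness Score vs Healthy Life Expectancy",
)
```

A p-value below 0.05 is the usual threshold for calling a slope statistically significant, but R-squared should be read alongside it, since a significant fit can still explain very little of the variance.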

Aggregation and Grouping Operations¶

Perform an aggregation on a groupby¶

I grouped by Country and Year, then calculated the mean for all numeric columns

Country Year Happiness_Score GDP_per_Capita Social_Support Healthy_Life_Expectancy Freedom Generosity Corruption_Perception Unemployment_Rate Education_Index Population Urbanization_Rate Life_Satisfaction Public_Trust Mental_Health_Index Income_Inequality Public_Health_Expenditure Climate_Index Work_Life_Balance Internet_Access Crime_Rate Political_Stability Employment_Rate Year_num
0 Australia 2005-01-01 5.430667 28886.978000 0.528667 68.128667 0.372000 0.181333 0.517333 11.041333 0.778000 7.037067e+08 59.478667 5.685333 0.512667 74.255333 41.165333 5.525333 64.565333 6.286667 71.750667 44.341333 0.552667 72.108667 2005.0
1 Australia 2006-01-01 6.030556 28936.431111 0.677222 63.465000 0.495000 0.100556 0.449444 8.176111 0.736667 7.845685e+08 61.441111 6.388333 0.481111 75.430000 36.857222 5.838333 69.659444 5.902222 67.400556 47.636111 0.516111 73.490556 2006.0
2 Australia 2007-01-01 5.600000 36317.625455 0.505000 65.911818 0.529545 0.167273 0.487273 9.251818 0.780455 6.854900e+08 64.544545 6.788636 0.530000 71.362727 38.910000 6.015000 66.739091 5.937273 68.869545 46.938636 0.564545 73.953636 2007.0
3 Australia 2008-01-01 5.868889 27944.319444 0.467778 69.021111 0.616667 0.170000 0.480556 13.337222 0.696667 8.398988e+08 64.130556 6.635556 0.530000 60.281667 33.376667 6.148333 59.651111 5.455000 73.648889 47.990556 0.658333 69.658333 2008.0
4 Australia 2009-01-01 5.182857 29764.803810 0.524762 69.346667 0.574762 0.088571 0.549048 11.704762 0.756667 6.325921e+08 60.689524 6.563333 0.436667 66.839048 47.194286 6.183810 55.751905 6.234286 69.601905 37.245238 0.486190 73.119048 2009.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
195 USA 2020-01-01 5.271667 28603.595417 0.494167 68.694167 0.533333 0.162083 0.441667 13.001667 0.792500 7.466417e+08 59.628333 6.985000 0.479167 68.115833 37.422083 5.592083 55.560000 6.275417 71.215417 44.359583 0.460833 76.734167 2020.0
196 USA 2021-01-01 6.193478 27768.738261 0.629565 67.822609 0.413913 0.157391 0.520000 12.670870 0.706522 8.094940e+08 55.241739 6.189565 0.471304 76.129565 43.584783 5.960435 67.486087 6.153478 65.183478 52.734348 0.489565 77.133478 2021.0
197 USA 2022-01-01 5.830370 32318.473704 0.459630 69.417778 0.566296 0.140000 0.521481 9.208889 0.743704 7.504774e+08 62.747778 6.675556 0.428148 78.861852 42.779630 5.602963 64.268889 6.167037 61.077407 38.973333 0.495556 71.130741 2022.0
198 USA 2023-01-01 5.465714 34045.116667 0.534286 61.773333 0.560476 0.168571 0.503810 11.015238 0.729524 7.384771e+08 56.519048 6.005714 0.464762 66.837619 41.858095 6.668095 60.468095 6.246190 65.696667 42.794762 0.423333 72.164286 2023.0
199 USA 2024-01-01 5.072857 34432.532857 0.480952 67.951429 0.449524 0.091429 0.662857 10.604762 0.741905 8.471548e+08 63.304286 6.712857 0.440476 69.573810 36.480952 7.103333 60.900000 5.776190 71.822857 42.460952 0.519524 73.016667 2024.0

200 rows × 25 columns
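The aggregation can be sketched as below, on made-up rows. Passing numeric_only=True restricts the mean to numeric columns, and as_index=False keeps Country and Year as ordinary columns, as in the table above; both are choices made for this sketch.

```python
import pandas as pd

# Hypothetical stand-in rows with repeated country-year pairs.
happiness = pd.DataFrame(
    {
        "Country": ["USA", "USA", "USA", "UK"],
        "Year": pd.to_datetime(["2021", "2021", "2022", "2021"], format="%Y"),
        "Happiness_Score": [6.0, 7.0, 5.0, 4.0],
        "Employment_Rate": [70.0, 80.0, 75.0, 60.0],
    }
)

# One row per (Country, Year), holding the mean of every numeric column.
yearly = happiness.groupby(["Country", "Year"], as_index=False).mean(numeric_only=True)

# A plain numeric year is convenient for regressions and plotting.
yearly["Year_num"] = yearly["Year"].dt.year
print(yearly)
```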

Further visualization of aggregated data¶

Conclusions¶

This data set contains multiple rows per country per year, with no duplicated values across the other columns. It is possible that the data is an amalgamation of other data sets, or that a missing column would explain why there are multiple rows within a year (such as different groups being surveyed, different regions within a country, or different time points within a year).

There are no clear overall correlations between any of the columns and happiness score. When the analysis is restricted to the USA in 2021-2024, there is a weak but statistically significant positive correlation between healthy life expectancy and happiness score (R-squared = 0.087, p = 0.004), with both going down together over that period.

From this data, overall happiness over the period 2005-2024 is highest for Australia and lowest for the UK, and the USA ranks 3rd among the ten countries in this data set.
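Two of the observations above, repeated rows per country and year and the ranking of countries by overall happiness, can be checked with short groupby calls. The sketch below uses made-up rows, so its numbers do not reproduce the project's results.

```python
import pandas as pd

# Hypothetical stand-in rows, not the project's data.
happiness = pd.DataFrame(
    {
        "Country": ["Australia", "Australia", "USA", "USA", "UK", "UK"],
        "Year": [2005, 2005, 2005, 2006, 2005, 2005],
        "Happiness_Score": [6.0, 7.0, 5.5, 5.5, 4.0, 5.0],
    }
)

# How many rows share each (Country, Year) pair?
rows_per_group = happiness.groupby(["Country", "Year"]).size()
repeated = rows_per_group[rows_per_group > 1]
print(repeated)

# Mean happiness per country over all years, highest first.
ranking = (
    happiness.groupby("Country")["Happiness_Score"]
    .mean()
    .sort_values(ascending=False)
)
print(ranking)
```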